Practical application of one-pass Viterbi algorithm in tokenization and part-of-speech tagging
Abstract
Sentence word segmentation and Part-of-Speech (POS) tagging are common preprocessing tasks for many Natural Language Processing (NLP) applications. This paper presents a practical application for POS tagging and segmentation disambiguation using an extension of the one-pass Viterbi algorithm called Viterbi-N. We introduce the internals of the developed system, which is based on lattices and a stochastic model built using second-order Hidden Markov Models (HMMs). We also present the results of an evaluation process and an analysis of the error cases. The results suggest that the Viterbi-N algorithm applied to lattices allows POS tagging and segmentation disambiguation to be accomplished in a common process. Although the tests were done for the Galician language, the solution proposed could be easily exported to other ...
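To make the idea of tagging and segmenting in one pass over a lattice concrete, here is a minimal sketch. It is not the paper's Viterbi-N algorithm: it assumes a first-order (bigram) HMM instead of the second-order model named in the abstract, and the function name `viterbi_lattice`, the English example, the tag names, and all probabilities are hypothetical values chosen for illustration. Each lattice edge is a candidate token with a candidate tag, so choosing the best path fixes the segmentation and the tags together.

```python
import math
from collections import defaultdict

def viterbi_lattice(n_positions, edges, trans, emit, floor=1e-6):
    """Return the best (token, tag) path through a segmentation lattice.

    edges: (start, end, token, tag) tuples with start < end, over word
           positions 0..n_positions.
    trans: (prev_tag, tag) -> probability (first-order; an assumption).
    emit:  (token, tag) -> probability.
    floor: probability used for unseen events (hypothetical smoothing).
    """
    by_start = defaultdict(list)
    for edge in edges:
        by_start[edge[0]].append(edge)

    # best[pos][tag] = (log-probability, (prev_pos, prev_tag, token))
    best = defaultdict(dict)
    best[0]["<s>"] = (0.0, None)

    # One left-to-right pass: every edge ending at pos starts before pos,
    # so best[pos] is final by the time edges starting at pos are expanded.
    for pos in range(n_positions):
        for _, end, token, tag in by_start[pos]:
            for prev_tag, (prev_score, _) in best[pos].items():
                score = (prev_score
                         + math.log(trans.get((prev_tag, tag), floor))
                         + math.log(emit.get((token, tag), floor)))
                if tag not in best[end] or score > best[end][tag][0]:
                    best[end][tag] = (score, (pos, prev_tag, token))

    if not best[n_positions]:
        return []  # no complete path reaches the final position

    # Follow backpointers from the best final state.
    tag = max(best[n_positions], key=lambda t: best[n_positions][t][0])
    pos, path = n_positions, []
    while pos > 0:
        _, (prev_pos, prev_tag, token) = best[pos][tag]
        path.append((token, tag))
        pos, tag = prev_pos, prev_tag
    return path[::-1]

# Toy lattice: "ice cream" as one token competes with "ice" + "cream".
edges = [
    (0, 1, "ice", "N"), (0, 1, "ice", "V"), (1, 2, "cream", "N"),
    (0, 2, "ice cream", "N"),
    (2, 3, "melts", "V"), (2, 3, "melts", "N"),
]
trans = {("<s>", "N"): 0.6, ("<s>", "V"): 0.1, ("N", "N"): 0.2,
         ("N", "V"): 0.5, ("V", "N"): 0.4}
emit = {("ice", "N"): 0.01, ("ice", "V"): 0.001, ("cream", "N"): 0.01,
        ("ice cream", "N"): 0.02, ("melts", "V"): 0.01, ("melts", "N"): 0.001}

best_path = viterbi_lattice(3, edges, trans, emit)
print(best_path)
```

In this sketch a path with fewer tokens multiplies fewer probabilities, so it tends to score higher than a path with more tokens. The sketch does nothing to correct for that bias, and a practical system would need to address it.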
Similar resources
Part-of-speech tagging of the Persian language using a fuzzy network model
Part-of-speech tagging (POS tagging) is an ongoing research area in natural language processing (NLP). The process of classifying words into their parts of speech and labeling them accordingly is known as part-of-speech tagging, POS tagging, or simply tagging. Parts of speech are also known as word classes or lexical categories. The purpose of POS tagging is determining the grammatical ...
Tagging Parts of Speech
This paper focuses on the task of tagging text with its parts of speech. The methodology chosen for this task is a Maximum Entropy based model which, although complex, will only be explained briefly. More importantly, the focus will center on the differences in performance of the maxent model with varying feature sets compared to the baseline model. One problem highlighted in part-of-speech tag...
Porting a Stochastic Part-of-Speech Tagger to Swedish
The Xerox Part-of-Speech Tagger (XPOST) claims to be practical. One aspect of practicality as defined here is reusability. Thus it is meant to be easy to port XPOST to a new language. To test this, XPOST was ported to Swedish. This port is described and evaluated. In previous work on part-of-speech tagging, a practical part-of-speech tagger was defined as one with the following set o...
Part of Speech Tagging for English Text Data
A variety of Natural Language Processing (NLP) tasks, such as named entity recognition, stemming and question answering, benefit from knowledge of the words' syntactic categories or Part-of-Speech (POS) [4][6]. POS taggers have been successfully applied to assign a single best POS to every word in a corpus [2][5][12]. This paper reports on the implementation and empirical comparison of three superv...
Part of Speech Tagging Using Statistical Approach for Nepali Text
Part-of-speech tagging has always been a challenging task in Natural Language Processing. This article presents POS tagging for Nepali text using a Hidden Markov Model and the Viterbi algorithm. Training and testing data sets are randomly separated from an annotated corpus of Nepali text. Both methods are employed on the data sets. The Viterbi algorithm is found to be computationally fa...
Publication date: 2007